chore: Add a function to traverse BFET and encode type usage #2390
bigframes/core/logging/data_types.py
curr_result = curr_result | _encode_type_refs_from_expr(
    assignment[0], node.child
)
elif isinstance(node, nodes.SelectionNode):
These node types are missing here: ReadLocalNode, ReadTableNode, ConcatNode, ExplodeNode, RandomSampleNode, FromRangeNode. You can find the full list of nodes in the compiler: https://github.com/googleapis/python-bigquery-dataframes/blob/main/bigframes/core/compile/ibis_compiler/ibis_compiler.py
Thanks! I think only ExplodeNode applies here because it has a deref field. The other nodes have no column-specific operations, so I will leave them out.
Added column tracking for ExplodeNode in this change.
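To illustrate the point about column references, here is a minimal sketch of how a node that refers to specific child columns could contribute to the encoded type usage. Everything in it is hypothetical: the `DTYPE_BITS` table, the `Child` and `Explode` stand-in classes, and the `column_ids` field are illustration only and do not reflect the actual bigframes node definitions or the bit layout used in this PR.

```python
from dataclasses import dataclass

# Hypothetical bit positions; the real encoding in the PR is not shown in this thread.
DTYPE_BITS = {"int64": 1 << 0, "float64": 1 << 1, "string": 1 << 2, "array": 1 << 3}


@dataclass(frozen=True)
class Child:
    """Stand-in for a child node; maps column id -> dtype name."""
    schema: dict


@dataclass(frozen=True)
class Explode:
    """Stand-in for a node that dereferences specific child columns."""
    child: Child
    column_ids: tuple


def encode_explode(node: Explode) -> int:
    """OR together the bits of every dtype the node's column references resolve to."""
    mask = 0
    for col_id in node.column_ids:
        dtype = node.child.schema[col_id]
        # Unknown dtypes contribute no bits in this sketch.
        mask |= DTYPE_BITS.get(dtype, 0)
    return mask
```

Nodes without column references (for example, a plain read or concat) would have nothing to resolve against the child schema, which is consistent with leaving them out.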
curr_result = curr_result | _encode_type_refs_from_expr(
    col_def.expression, node.child
)
If we can explicitly handle all existing nodes, we should throw an error for unknown new nodes. When we add new BF nodes, this will remind us to handle logging for them.
I added an explicit list of the nodes to be skipped, but I think it's risky to raise errors in the logging code path: users might get blocked and confused by logging errors that have nothing to do with their business code, which leads to a bad experience.
IIUC, the BF node types are limited and all known to us, so customers won't see the errors. If logs are missed quietly, the gap would go untracked. That's my thinking, but I'll leave it to your call. Thanks!
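The trade-off discussed in this thread can be sketched as a traversal with an explicit skip list and a switch for how unknown nodes are treated. This is only an illustration under assumptions: the node classes, the `mask` attribute, and the `strict` flag are all hypothetical and are not part of the PR. With `strict=False` an unknown node is logged and ignored, so user code is never blocked; with `strict=True` (for example, in unit tests) it raises, so a newly added node type cannot be missed quietly.

```python
import logging

logger = logging.getLogger("type_usage_sketch")


class Node:
    """Hypothetical plan node; `mask` stands in for the bits its expressions contribute."""

    def __init__(self, *children, mask=0):
        self.children = children
        self.mask = mask


class TrackedNode(Node):
    pass


class SkippedNode(Node):
    pass


class BrandNewNode(Node):
    pass


TRACKED = (TrackedNode,)   # nodes with column-specific operations
SKIPPED = (SkippedNode,)   # nodes explicitly listed as having nothing to track


def encode_type_refs(root, strict=False):
    """Walk the tree iteratively and OR together the bits of tracked nodes."""
    result = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, TRACKED):
            result |= node.mask
        elif isinstance(node, SKIPPED):
            pass  # known node, intentionally not tracked
        elif strict:
            raise NotImplementedError(f"unhandled node type: {type(node).__name__}")
        else:
            logger.warning("type usage not tracked for %s", type(node).__name__)
        stack.extend(node.children)
    return result
```

Whether such a switch is worth the extra code path is a judgment call; the sketch only shows that the two positions are not mutually exclusive.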
def _encode_type_refs_from_expr(
    expr: expression.Expression, child_node: bigframe_node.BigFrameNode
) -> int:
    # TODO(b/409387790): Remove this branch once SQLGlot compiler fully replaces Ibis compiler
Can this throw any errors if we forget to remove this branch?
Once the SQLGlot migration is complete, this branch will become dead code. I don't think there's a need to raise an error because it does not affect any functionality.
Got it. Thanks!
Next step is to add the encoded usage to the job_config.label before SQL dispatch.
Related bug: b/406578908
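As a rough sketch of the next step mentioned above, the encoded integer could be rendered as a short string and merged into the job labels before dispatch. The label key `bigframes-dtypes` and the helper name are assumptions made for illustration, not what the follow-up change will necessarily use. The sketch works on a plain dict, which is the shape of the labels mapping on a BigQuery job config; hex is used because BigQuery label values are limited to lowercase letters, digits, underscores, and dashes.

```python
def add_type_usage_label(labels, encoded, key="bigframes-dtypes"):
    """Return a copy of `labels` with the encoded type usage added as lowercase hex.

    `key` is a hypothetical label name chosen for this sketch.
    """
    if encoded < 0:
        raise ValueError("encoded type usage must be a non-negative bitmask")
    merged = dict(labels)  # do not mutate the caller's mapping
    merged[key] = format(encoded, "x")
    return merged
```

The returned mapping would then be assigned to the job config's labels before the query is sent.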